MULTIPLIER-FREE METHODS AND APPARATUS FOR SIGNAL PROCESSING IN A DIGITAL COMMUNICATION SYSTEM

Field of the Invention 

The invention relates generally to digital communication systems and, more particularly, to
signal processing operations, such as filtering, channel estimation, and channel equalization, for use
in such systems.

Background of the Invention 

Channel estimation and equalization are important determinants of the quality of achieved
data throughput in a digital communication system receiver. In conventional receivers, channel
estimation is often performed using a technique known as Least-Mean-Squares (LMS) estimation,
while channel equalization is performed using Maximum-Likelihood (ML) sequence detection via
the Viterbi algorithm. A problem with these existing techniques is that, in their original form, they
generally require many complex multiplication operations. More particularly, LMS estimation
requires 2M multiplications, and a full-search Viterbi algorithm requires PM multiplications, where
M is the order of the channel estimator, and P is the size of the symbol alphabet. The large number
of multiplications may render these conventional estimation and equalization techniques
prohibitively expensive in terms of the required computational resources, particularly at very high
data rates.

Subsequent implementations of the LMS estimation technique have attempted to reduce the
required number of multiplications through the use of signed approximations of a regression vector
and/or an error signal, as described in, e.g., T.A.C.M. Claasen and W.F.G. Mecklenbräuker,
"Comparison of the convergence of two algorithms for adaptive FIR digital filters," IEEE Trans. on
Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 3, pp. 670-678, June 1981; D.L.
Duttweiler, "A twelve-channel digital echo canceler," IEEE Trans. on Communications, Vol. COM-26,
No. 5, May 1978; and R.D. Gitlin, J.E. Mazo and M.G. Taylor, "On the design of gradient
algorithms for digitally implemented adaptive filters," IEEE Trans. on Circuit Theory, Vol. 20, No.
2, pp. 125-136, Mar. 1973. These non-linear methods, however, can alter the training behavior such
that the training speed is considerably reduced. As a result, these methods are typically suitable for
use only in applications in which long training sequences are available, e.g., broadcasting
applications.

It has also been proposed that pipelining be introduced in order to increase the throughput
of the implemented hardware, as described in, e.g., M.D. Meyer and D.P. Agrawal, "A modular
pipelined implementation of a delayed LMS transversal adaptive filter," Proc. of ISCAS, New
Orleans, pp. 1943-1946, 1990. A pipelining technique, however, leads to delayed updates that also
corrupt the learning behavior of the LMS algorithm. See, e.g., P. Kabal, "The stability of adaptive
minimum mean square error equalizers using delayed adjustment," IEEE Trans. on
Communications, Vol. COM-31, No. 3, pp. 430-432, Mar. 1983; G. Long, F. Ling and J.A. Proakis,
"The LMS algorithm with delayed coefficient adaptation," IEEE Trans. on Acoustics, Speech and
Signal Processing, Vol. ASSP-37, No. 9, pp. 1397-1405, Sept. 1989; and G. Long, F. Ling and J.A.
Proakis, "Corrections to 'The LMS algorithm with delayed coefficient adaptation'," IEEE Trans.
on Signal Processing, Vol. SP-40, No. 1, pp. 230-232, Jan. 1992.

Although a number of techniques have been developed to compensate for the above-noted
corruption, e.g., as described in M. Rupp and R. Frenzel, "The behavior of LMS and NLMS
algorithms with delayed coefficient update in the presence of spherically invariant processes," IEEE
Trans. on Signal Processing, Vol. SP-42, No. 3, pp. 668-672, March 1994; and E. Bjarnason, "Noise
cancellation using a modified form of the filtered-XLMS algorithm," Proc. Eusipco Signal
Processing V, Brüssel, pp. 1053-1056, 1992, such techniques often require even more
multiplications. Straightforward realizations of delayed-update LMS with compensation are
described in T. Kimijima, K. Nishikawa and H. Kiya, "A pipelined architecture for DLMS algorithm
considering both hardware complexity and output latency," Proc. Eusipco, Patras, Greece, pp.
503-506, Sep. 1998.

As is apparent from the above, a need exists for improved channel estimation, channel
equalization and other signal processing techniques, which can eliminate or substantially reduce the
required number of multiplications, without significantly altering training behavior, numerical
precision or other desirable attributes of the corresponding algorithms.

Summary of the Invention

The present invention provides efficient computational techniques suitable for use in channel
estimation, channel equalization and other signal processing operations. In accordance with the
invention, signal processing operations are performed in a digital communication system receiver
on a sequence of received symbols, each representing a number of information bits. The symbols
correspond to points in a given modulation constellation generated by applying a predetermined
rotation, e.g., a 45° rotation, to an otherwise conventional modulation constellation, e.g., a QPSK
constellation, a 16-QAM constellation, a 64-QAM constellation, a 256-QAM constellation, a
1024-QAM constellation, etc. The use of the rotated constellation allows certain signal processing
operations, such as Finite Impulse Response (FIR) filtering, Least-Mean-Squares (LMS) estimation,
and Maximum-Likelihood (ML) sequence detection via the Viterbi algorithm, to be performed
without the need for multipliers. By eliminating or substantially reducing the number of required
multipliers, the invention significantly reduces the complexity and delay associated with the
corresponding signal processing circuitry.

In an illustrative embodiment of the invention, the signal processing operation utilizes a
selector to implement a complex multiplication of a channel estimate coefficient with a symbol from
a given modulation constellation. The selector receives as inputs real and imaginary parts of an
element of the channel estimate coefficient, and generates as outputs real and imaginary parts of a
product of the element of the channel estimate coefficient and a corresponding element of a given
one of the symbols, without utilizing a multiplication operation. The selector may include, e.g., first
and second switches and first and second add/subtract units, the first and second switches each
selecting one of the real or the imaginary part of the element of the channel estimate coefficient for
application to a corresponding one of the add/subtract units, such that the add/subtract units compute
elements of real and imaginary parts of an inner vector product. An FIR filter operation may be
implemented using the selector by including feedback from outputs of the add/subtract units to
corresponding inputs of the add/subtract units. The selectors can be arranged in multi-stage and/or
hierarchical adder tree structures in order to implement the particular processing operations required
in a given application.

Advantageously, the invention not only can completely eliminate multiplications in certain
embodiments, but also minimizes the number of addition operations required in implementing FIR
filtering and other operations prevalent in channel estimation and equalization algorithms. The
invention can be embodied in channel estimation algorithms such as LMS estimation and
equalization algorithms such as ML sequence detection using the Viterbi algorithm, and preserves
the numerical precision of the algorithms while reducing their complexity dramatically. For
example, an illustrative embodiment of the invention requires only selection operations, i.e., no
adders or multipliers, to implement the multiplication of a complex-valued filter tap coefficient and
a QPSK constellation point.

Brief Description of the Drawings 

FIGS. 1(a) and 1(b) show exemplary conventional and rotated QPSK constellations, 
respectively. 

FIG. 2 is a table showing the mapping of two bits into a rotated QPSK constellation point
in accordance with the invention.

FIG. 3 shows a basic selector operation for implementing a complex multiplication of a
coefficient with a symbol from a QPSK constellation, in accordance with the invention.

FIGS. 4 and 5 illustrate other structures for implementing filter operations in accordance with
the invention, utilizing the basic selector operation of FIG. 3.

FIG. 6 shows a recursive add/sub operation for complete vector product computation in
accordance with the invention.

FIGS. 7(a) through 7(f) illustrate the rotation of a 16-QAM constellation, and its separation
into four subsets.

FIG. 8 shows a two-step chain for multiplication in a 16-QAM set in accordance with the
invention.

FIG. 9 shows a three-step chain for multiplication in a 64-QAM set in accordance with the
invention.

FIG. 10 is a table showing the number of add/sub operations required to implement an FIR
filter with M complex-valued taps, in accordance with the invention.

FIG. 11 shows an FIR filter chain with 64-QAM blocks for multiplication in accordance with
the invention.

FIG. 12 is a table showing selection operations in accordance with the invention.

FIG. 13 illustrates a basic add/sub structure for implementing an FIR filter, in accordance
with the invention.

FIG. 14 illustrates a first stage of an adder tree for implementing an FIR filter, in accordance
with the invention.

FIG. 15 is a table comparing, for different types of modulation, the number of required
add/sub operations and the corresponding latency in an LMS technique in accordance with the
invention.

FIG. 16 illustrates an LMS update operation in accordance with the invention.

FIG. 17 is a table comparing, for different types of modulation, the number of required
add/sub operations per coefficient to initialize a trellis of the Viterbi algorithm in accordance with
the invention.

FIGS. 18 and 19 show examples of communication system receivers in which the techniques
of the invention may be implemented.

Detailed Description of the Invention 

The present invention will be described herein with reference to particular types of signal
processing devices, such as FIR filters, channel estimators, channel equalizers and Viterbi decoders.
It should be understood, however, that the invention is more generally applicable to other types of
signal processing devices and applications.

1. Basic Rotated Constellation Approach 
The invention will first be illustrated with an example based on QPSK modulation. Rather
than using a conventional constellation, as shown in FIG. 1(a), with the constellation points
$u_i \in \{\frac{1}{\sqrt{2}}(\pm 1 \pm j)\}$, the invention utilizes a rotated constellation with points
$u_i \in \{1, j, -1, -j\}$. The resulting rotated constellation is shown in FIG. 1(b). Rotating the
constellation points in accordance with the invention does not change the behavior of the digital
modulation scheme nor its transmission, since any given hardware-dependent implementation as well
as the transmission channel will typically add arbitrary rotations in any case. It has been noted in
G.M. Durant and S. Ariyavisitakul, "Implementation of a broadband equalizer for high-speed wireless
data applications," Proc. IEEE ICUPC 98, Florence, Italy, Oct. 1998, that using $u_i \in (\pm 1 \pm j)$
rather than $u_i \in \frac{1}{\sqrt{2}}(\pm 1 \pm j)$ saves complexity since multiplications can be
performed as add/sub operations. However, the rotated constellation of FIG. 1(b) simplifies this
operation even further. To illustrate this, consider the implementation of an inner vector product,
an operation commonly used in the above-noted LMS and Viterbi algorithms. Elements of a channel
estimate, e.g., are combined in a row vector $w = [w_1, w_2, \ldots, w_M]$, while transmitted modulated
symbols are represented by a row vector $u = [u_1, u_2, \ldots, u_M]$.
The inner vector product to compute is thus given by

$$u w^T = \sum_{i=1}^{M} u_i w_i$$

with $T$ denoting the transpose operation. In other words, a complex multiplication is required to
multiply each transmitted symbol $u_i$, where $i = 1 \ldots M$, with one of the channel estimate weights
$w_i$. A complex multiplication usually requires four real multiplications and two add/subtract
(add/sub) operations. That is,

$$u_i w_i = \mathrm{Real}(u_i)\mathrm{Real}(w_i) - \mathrm{Imag}(u_i)\mathrm{Imag}(w_i) + j\{\mathrm{Real}(u_i)\mathrm{Imag}(w_i) + \mathrm{Imag}(u_i)\mathrm{Real}(w_i)\}.$$

Other realizations with three multiplications and more add/sub operations are possible:

$$A = \mathrm{Imag}(u_i) \times \{\mathrm{Real}(w_i) + \mathrm{Imag}(w_i)\} \qquad (1)$$

$$B = \mathrm{Real}(w_i) \times \{\mathrm{Real}(u_i) + \mathrm{Imag}(u_i)\} \qquad (2)$$

$$C = \mathrm{Real}(u_i) \times \{\mathrm{Imag}(w_i) - \mathrm{Real}(w_i)\} \qquad (3)$$

$$u_i w_i = (B - A) + j(B + C) \qquad (4)$$



If, however, the structure of the rotated QPSK constellation, $u_i \in \{1, j, -1, -j\}$, is taken into
account, the multiplications completely disappear, and only the following four cases remain:

$$u_i = 1: \quad u_i w_i = \mathrm{Real}(w_i) + j\,\mathrm{Imag}(w_i) \qquad (5)$$

$$u_i = j: \quad u_i w_i = -\mathrm{Imag}(w_i) + j\,\mathrm{Real}(w_i) \qquad (6)$$

$$u_i = -1: \quad u_i w_i = -\mathrm{Real}(w_i) - j\,\mathrm{Imag}(w_i) \qquad (7)$$

$$u_i = -j: \quad u_i w_i = \mathrm{Imag}(w_i) - j\,\mathrm{Real}(w_i). \qquad (8)$$

In other words, the multiplication becomes a selection operation. As will be described in greater
detail below, even the add/sub operations are not necessary.
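
As a minimal sketch of Equations (5) through (8), the following Python function forms the product
using only swaps and sign changes. The name `select_product` and the convention that an integer
index k stands for the symbol j^k are assumptions made for illustration; they are not taken from
the figures.

```python
def select_product(w_re, w_im, k):
    """Return (real, imag) of u * w for the rotated QPSK symbol u = j**k.

    Only swaps and negations are used; no multiplier and no adder is needed.
    The index convention k -> j**k is an illustrative assumption.
    """
    if k == 0:               # u = 1,  Equation (5)
        return w_re, w_im
    if k == 1:               # u = j,  Equation (6)
        return -w_im, w_re
    if k == 2:               # u = -1, Equation (7)
        return -w_re, -w_im
    if k == 3:               # u = -j, Equation (8)
        return w_im, -w_re
    raise ValueError("k must be 0, 1, 2 or 3")
```

For example, with w = 3 - 5j and u = j, the call `select_product(3, -5, 1)` yields the pair (5, 3),
which agrees with j(3 - 5j) = 5 + 3j.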

FIG. 2 shows one possible mapping of pairs of bits $b_1$ and $b_0$ into the rotated QPSK
constellation points $u_i \in \{1, j, -1, -j\}$ of FIG. 1(b). Arbitrary mappings, such as Gray coding, can
be implemented by first applying an appropriate conversion mapping, followed by the FIG. 2
mapping.

FIG. 3 illustrates a basic selector structure for implementing the above-noted selection 
operation, i.e., the selection process for a QPSK modulation that leads to a complex multiplication 
according to the table of FIG. 2 and Equations (5) through (8) above. The basic selector structure 
includes inverters 10-1 and 10-2, and switches 12-1 and 12-2, interconnected as shown. As is 
apparent from FIG. 3, the selection operation does not require any multiplication or addition. 
Instead, two's complement logic is assumed, as well as some basic logic to control the selectors
based on the values of bits $b_0$ and $b_1$. In this selector structure, the inverters 10-1 and 10-2 are
assumed to be active high. That is, inversion is performed if the control signal value is "1" and the
input signal is passed through without inversion when the control signal value is "0." The switches
12-1 and 12-2 are assumed to be connected to the upper position when the control signal value is
"1," and connected to the lower position when the control signal value is "0."

FIGS. 4 and 5 show possible hardware realizations for an operation to compute an inner
vector product of length M, utilizing the basic selector of FIG. 3. In this case, addition operations
cannot be completely eliminated. Assume that the partial sum is computed up to position $(k - 1)$:

$$s_{k-1} = \sum_{i=1}^{k-1} u_i w_i. \qquad (9)$$

Then, the next step to compute $s_k$ is

$$s_k = s_{k-1} + u_k w_k, \quad k = 2 \ldots M. \qquad (10)$$

The FIG. 4 implementation includes the elements of the FIG. 3 selector as well as adders 14-1 and
14-2 for adding the real and imaginary parts of $s_{k-1}$ to the real and imaginary parts, respectively, of
$u_k w_k$. The FIG. 5 implementation eliminates the inverters 10-1 and 10-2 of the FIG. 3 selector by
incorporating two's complement operations into add/sub units 16-1 and 16-2. Thus, apart from the
switches 12-1 and 12-2 and other supporting logic, only two add/sub units are required if the sum
is to be computed recursively. The add/sub units 16-1 and 16-2 are assumed to perform the addition
operation when the value of their corresponding control signal is "1."

FIG. 6 shows a fully recursive structure that can perform a complete FIR operation in
accordance with Equation (9) without requiring any additional hardware. The FIG. 6 structure
corresponds to the FIG. 5 structure with the addition of feedback to supply the real and imaginary
parts of $s_{k-1}$ to the corresponding inputs of the add/sub units 16-1 and 16-2, respectively.
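
The recursion of Equation (10) with folded-in sign handling can be sketched in Python as follows.
This is an illustrative model rather than a description of the circuits in the figures: the names
`accumulate` and `inner_product`, and the index convention k for the symbol j^k, are assumptions.
Each step performs one real addition or subtraction per component, in the spirit of the two add/sub
units of FIG. 5 with the feedback of FIG. 6.

```python
def accumulate(s_re, s_im, w_re, w_im, k):
    """One recursion step s_k = s_{k-1} + u_k * w_k for u_k = j**k.

    The sign of the selected operand is absorbed into the choice between
    addition and subtraction, so no separate inverter is modeled.
    """
    if k == 0:                       # u = 1
        return s_re + w_re, s_im + w_im
    if k == 1:                       # u = j
        return s_re - w_im, s_im + w_re
    if k == 2:                       # u = -1
        return s_re - w_re, s_im - w_im
    if k == 3:                       # u = -j
        return s_re + w_im, s_im - w_re
    raise ValueError("k must be 0, 1, 2 or 3")


def inner_product(weights, symbol_indices):
    """Inner product of symbols and weights using add/sub operations only."""
    s_re, s_im = 0, 0
    for (w_re, w_im), k in zip(weights, symbol_indices):
        s_re, s_im = accumulate(s_re, s_im, w_re, w_im, k)
    return s_re, s_im
```

With weights (1 + 2j, 3 - 4j, -5 + 6j) and symbols (j, -j, -1), the sketch returns (-1, -8), i.e.,
-1 - 8j.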

1.1 Extension to QAM Constellations 
The above-described techniques can be applied to larger signal constellations, as will now
be described with reference to an example based on a 16-QAM constellation. FIG. 7(a) shows the
16-QAM constellation, with a set of 16 constellation points,
$u_i \in \{\pm 1 \pm j, \pm 1 \pm 3j, \pm 3 \pm j, \pm 3 \pm 3j\}$.
In accordance with the invention, the FIG. 7(a) constellation is rotated by 45°, resulting in the
rotated constellation as shown in FIG. 7(b). FIGS. 7(c) through 7(f) show four different subsections
of the rotated constellation of FIG. 7(b).

Each subsection in this example corresponds to a translated QPSK constellation. Note also 
that the center point of each subsection corresponds to a point in a rotated QPSK constellation. 
Thus, a first step involves the selection of the center point of a subsection, which is the same as 
selecting a point in a rotated QPSK constellation as described previously. In a second step, the 
actual signal is selected. This is another selection process, similar to the previous selection. Note, 
however, that in the first step the value corresponding to the center point of each subsection is twice 
as large as the correction signal that is to be added. Thus, before the second step, a shift operation 
is required. In other words, in order to implement a multiplication with a symbol from a 16-QAM
constellation, the corresponding channel weight is first selected as described in
Equations (5) through (8) above, then shifted, and finally added to another selected
value, as illustrated in FIG. 8.

The FIG. 8 structure includes a selector (SEL) 30, a left-shift (L-SH) element 32, and a
recursive selector (REC) 34. The selector 30 in FIG. 8 corresponds to the selector shown in FIG.
3, and the recursive selector 34 corresponds to the recursive structure of FIG. 4 or FIG. 5. The
left-shift element 32 implements the above-described left shift operation.

An example of the processing implemented by the FIG. 8 structure is as follows. Consider
the constellation point $(1 + 3j)$ in the conventional 16-QAM constellation, which is now mapped into
$(2 + j)$ in the manner previously described. Note that the definition of one unit in an actual
implementation is arbitrary. In the first step, a multiply-by-2 operation is performed using a selection
operation implemented by selector 30, followed by a shift operation implemented by element 32.
After that, the coefficient is multiplied by $j$, which is another selection operation. Finally, the two
values obtained from the two steps are added together. This complex multiplication is implemented
using two real add operations in the recursive selector 34.

FIG. 9 shows the recursive structure for multiplying with a 64-QAM constellation point.
Compared to the FIG. 8 recursive structure for the 16-QAM constellation, the structure for the
64-QAM constellation requires a three-stage recursion, and thus an additional left-shift element 36 and
recursive selector 38. The structures for larger constellations can be generated in a similar manner,
as will be apparent to those skilled in the art.
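
The two-step and three-step chains of FIGS. 8 and 9 can be modeled with a short Python sketch.
It assumes, for illustration only, that a rotated QAM symbol is described by a list of quarter-turn
indices (c1, c2, ...), each standing for a rotated QPSK point j^c, so that a 16-QAM symbol equals
2 j^c1 + j^c2 and a 64-QAM symbol equals 4 j^c1 + 2 j^c2 + j^c3. The function names and this digit
representation are hypothetical.

```python
def select_product(w_re, w_im, k):
    """(real, imag) of (j**k) * w using swaps and negations only."""
    if k == 0:
        return w_re, w_im
    if k == 1:
        return -w_im, w_re
    if k == 2:
        return -w_re, -w_im
    if k == 3:
        return w_im, -w_re
    raise ValueError("k must be 0, 1, 2 or 3")


def qam_product(w_re, w_im, digits):
    """Multiply integer-valued w by a rotated QAM symbol given as quarter-turn digits.

    Each stage is: left shift of the running value, selection, two real adds.
    Two digits model the 16-QAM chain, three digits the 64-QAM chain.
    """
    acc_re, acc_im = select_product(w_re, w_im, digits[0])   # first selection
    for k in digits[1:]:
        acc_re, acc_im = acc_re << 1, acc_im << 1            # left-shift element
        p_re, p_im = select_product(w_re, w_im, k)           # next selection
        acc_re, acc_im = acc_re + p_re, acc_im + p_im        # two real adds
    return acc_re, acc_im
```

For w = 3 - 5j and the rotated 16-QAM point 2 + j (digits [0, 1]), the sketch returns (11, -7),
matching (2 + j)(3 - 5j) = 11 - 7j. Calling it with w = 1 recovers the symbol itself, which is a
convenient way to enumerate the constellation implied by the digit representation.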

1.2 Minimal Operations 
The techniques of the invention as described thus far not only allow considerable reductions
in computational complexity, but are also well suited for pipelined implementations. In some
applications, to be described in detail below, it is of importance to compute an entire set of possible
outcomes when multiplying a symbol from a limited alphabet size with a complex value $w_i$. In this
case, the complexity can be reduced further since many intermediate results can be re-used.

Taking the first quadrant for the 16-QAM constellation of FIG. 7 as an example, the four
possible coefficients are

$$A + jB = w_R + j w_I, \qquad (11)$$

$$C + jD = 3 w_R + 3j w_I, \qquad (12)$$

$$E + jF = 2 w_R - w_I + j(2 w_I + w_R), \qquad (13)$$

$$G + jH = 2 w_R + w_I + j(2 w_I - w_R). \qquad (14)$$

The operation in the first line is free; the second line costs two adds, as do the third and the fourth.
Since all other values can be derived by multiplying several times by $j$, they are obtained by
swapping the real and imaginary values and inverting them. Since only eight different values (A through H)
are involved, an additional eight inverters are required. Thus, the complete cost is 6 adders and 8
inverters, or equivalently, 14 add/sub operations. Similarly, if this method is applied to a 64-QAM
constellation, 36 add/sub operations plus 32 inverters are required.
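
The re-use of intermediate results can be illustrated with the following Python sketch, which
computes the four first-quadrant products of Equations (11) through (14) with six real additions
and then derives the remaining twelve by repeated multiplication by j. The function names are
hypothetical, and integer inputs are assumed so that doubling can be written as a left shift.

```python
def first_quadrant_products(w_re, w_im):
    """Products of w with the rotated 16-QAM points 1, 3, 2+j and 2-j."""
    two_re, two_im = w_re << 1, w_im << 1       # doubling is a shift, not an add
    a, b = w_re, w_im                           # Eq. (11): free
    c, d = two_re + w_re, two_im + w_im         # Eq. (12): two adds
    e, f = two_re - w_im, two_im + w_re         # Eq. (13): two adds
    g, h = two_re + w_im, two_im - w_re         # Eq. (14): two adds
    return [(a, b), (c, d), (e, f), (g, h)]


def all_16qam_products(w_re, w_im):
    """All 16 products, obtained by rotating the four base products by j."""
    products = []
    for re, im in first_quadrant_products(w_re, w_im):
        for _ in range(4):
            products.append((re, im))
            re, im = -im, re                    # multiply by j: swap and negate
    return products
```

The sixteen pairs returned for a given w can be checked against direct complex multiplication of w
with each rotated constellation point.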

2. Implementation Examples 
It will now be described how the above-described techniques can be used to implement FIR 
filters, LMS algorithms and ML sequence detection using the Viterbi algorithm.

2.1 FIR Filter

FIG. 10 shows a table comparing the number of operations required in an FIR filter
implemented using the techniques of Section 1.1 with that of a conventional FIR filter, for different
types of modulation. A conventional FIR filter with M complex-valued coefficients requires 4M real
multiplications and (4M - 2) real additions. The table of FIG. 10 shows the number of real add/sub
operations required when applying the techniques of the invention. It can be seen from the table that,
even for large constellations such as 1024-QAM, the techniques of the invention provide
considerably lower complexity.

FIG. 11 shows a possible implementation of an FIR filter chain using a concatenation of the
64-QAM blocks of FIG. 9 for multiplication. The concatenated 64-QAM blocks are denoted 40-1,
40-2, ... 40-M. The outputs of the blocks 40-1, 40-2, ... 40-M are combined in corresponding adders
42-1, 42-2, ... 42-(M-1). Another possible approach is to combine all first stages of the
multiplication operations first, then the second and finally the third stages. This approach can save
mantissa length and thus chip area. Typical values of M range from about three to 64, while 1024-QAM
or smaller constellations are typically used.

Note however that the concept of implementing FIR filters of high order generally requires 
a long chain of adders. Many applications can apply pipelining to this structure in order to increase 
throughput at the expense of increased latency and increased chip area due to additional registers. 
Also note that the new selector techniques of the invention generally require only a few bits of
information to be stored for each symbol, e.g., two bits for QPSK, so that a pipelining
implementation of the technique with its additional registers typically does not require significantly
more area.

It should also be noted that some applications, such as the LMS algorithm, do not allow for
latency. In this case, a hierarchical tree structure can be used to reduce the delay from $(M - 1) T_A$ to
$\log_2 M \, T_A$, where $T_A$ is the delay time of one adder. For implementing such a tree structure, the basic
selector structure of FIG. 3 may be preferable, although now two two's complement inverters will
generally be required for each coefficient, i.e., 2M such inverters altogether. These two's
complement inverters not only require cell area but also introduce a carry-ripple effect similar to that
of standard adders, leading to an additional delay. Structures which avoid this additional delay will
now be described with reference to FIGS. 12, 13 and 14.
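
The latency benefit of a hierarchical tree over a chain of adders can be illustrated with a small
Python model. It is a sketch only: the function name `tree_sum` is hypothetical, and the returned
depth simply counts adder stages, assuming every stage has the same delay.

```python
def tree_sum(values):
    """Sum a non-empty list pairwise and report the number of adder stages.

    A linear chain of M values needs M - 1 sequential additions, whereas the
    tree needs ceil(log2(M)) stages, since each stage halves the list.
    """
    level = list(values)
    if not level:
        raise ValueError("values must not be empty")
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired element passes through unchanged
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth
```

For sixteen inputs the sketch reports four stages, compared with fifteen sequential additions in a
linear chain.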

FIG. 13 shows a basic add/sub structure for a fast FIR filter implementation suited to add or
subtract two numbers A and B, i.e., to perform the operation $Z = \pm A \pm B$. For such an operation,
either two add/sub structures or one add/sub with an additional inverter would generally be
necessary. If, however, the sign signal is treated separately, just a pair of switches 50-1, 50-2 and
one add/sub structure 52 is sufficient, as illustrated in FIG. 13. This results in a savings in area as
well as latency. In order to realize the operation efficiently, the sign information, S(A) and S(B), for
the two numbers, A and B, has to be provided. The result of the operation is a signed number Z with
new sign information S(Z) and the add/sub selection:

$$S(Z) = S(A) \wedge S(B), \qquad (15)$$

$$\mathrm{ADD/SUB} = S(A) \oplus S(B). \qquad (16)$$

The "wedge" operator $\wedge$ stands for the logical AND function. FIG. 12 shows a table listing the four
possible operations in terms of the sign operators and the outgoing signal Z. Furthermore, a
multiplexer is required to select the correct input in case of a subtraction (A - B, B - A). The number
Z is passed along together with its sign S(Z) to the following adder structure. The first stage only
requires an additional selector for the coefficients. For adding two numbers A and B only the first
operation A + B is required.
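
A behavioral Python sketch of Equations (15) and (16) is given below. It assumes that a sign flag
of 1 marks a negated operand, and that the multiplexer chooses between A - B and B - A so that the
mixed-sign cases need no outgoing sign flag; both points are interpretations of the text, since
the table of FIG. 12 is not reproduced here. The names `signed_addsub` and `value` are hypothetical.

```python
def signed_addsub(a, s_a, b, s_b):
    """Combine (-1)**s_a * a and (-1)**s_b * b with a single add/sub operation.

    Returns (z, s_z) such that (-1)**s_z * z equals the signed sum.
    """
    s_z = s_a & s_b              # Eq. (15): result flagged negative only if both are
    subtract = s_a ^ s_b         # Eq. (16): subtract when the signs differ
    if not subtract:
        z = a + b                # +A+B, or -(A+B) with s_z = 1
    elif s_b:
        z = a - b                # +A-B
    else:
        z = b - a                # -A+B, operands swapped by the multiplexer
    return z, s_z


def value(z, s_z):
    """Numeric value represented by a (number, sign flag) pair."""
    return -z if s_z else z
```

In all four sign combinations, `value(*signed_addsub(a, s_a, b, s_b))` equals the sum of the two
signed inputs, and no standalone two's complement inversion is performed.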

Treating the sign separately as described in FIGS. 12 and 13 also provides a number of
additional advantages. For example, a QPSK constellation, as well as larger QAM constellations,
can be implemented by assigning different values for the signs of A and B. The modulation is thus
already part of the adder tree. FIG. 14 shows a first stage of an adder tree for a fast FIR filter
implementation, illustrating this feature. The first stage of the tree includes switches 60-1, 60-2,
60-3 and 60-4, and add/sub units 62-1 and 62-2, interconnected as shown.

Another advantage of treating the sign separately is that the sign information can be passed
along to the next add/sub stage in the adder tree and can thus be incorporated in a subsequent add
operation without the need for additional inverters. At the end of the tree structure, one inverter
might be necessary to provide final correction for the sign. It will be shown in the following section
that the LMS algorithm can use this sign information immediately so that even this final inverter is
not required.

As described previously, the adder tree comprises add/sub structures as shown in FIG. 13,
and the first stage can combine the selection of the coefficients and the add operations. This is
shown in the FIG. 14 structure, which assumes that two complex coefficients $A_R + jA_I$ and $B_R + jB_I$
are QPSK modulated by the corresponding bits $\{a_0, a_1\}$ and $\{b_0, b_1\}$. The outgoing sign information
of the output $Z_R + jZ_I$ is given by

$$S(Z_R) = (a_0 \oplus a_1) \wedge (b_0 \oplus b_1) \qquad (17)$$

$$S(Z_I) = a_1 \wedge b_1 \qquad (18)$$

Subsequent adder stages are constructed in a straightforward manner in accordance with Equations 
15 and 16. 

2.2 LMS Algorithm 

The LMS algorithm is known to be of 2M complexity, and generally requires 8M real
multiplications for a complex input. Its complexity is defined by two steps:

$$e(i) = d(i) - u_i w_i^T \qquad (19)$$

the error equation with

$$u_i = [u(i), u(i-1), \ldots, u(i-M+1)] \qquad (20)$$

and the coefficient update equation:

$$w_{i+1} = w_i + \mu e(i) u_i^* \qquad (21)$$

with

$$w_i = [w_i(0), w_i(1), \ldots, w_i(M-1)] \qquad (22)$$

denoting $w_i(l)$ the tap weight at time instant $i$, with index $l$ ranging from 0 to $M - 1$. If the technique
explained in the previous section is applied, the multiplications for the computation of the error $e(i)$ can
be replaced by select and add/sub operations. To compute the coefficient update (Equation 21),
if the step-size $\mu$ is a power of two ($\mu = 2^{n}$ for an integer $n$), the multiplication $\mu e(i)$ can be
replaced by a simple shift operator. The remaining multiplication with the symbols $u_i$ can also be
achieved with the new technique. All that remains is to update the coefficient, which is a complex addition.
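
One LMS iteration for rotated QPSK symbols can be sketched in Python as follows. The sketch is
illustrative and rests on several assumptions: symbols are given as indices k standing for j^k,
the step-size is 2^(-shift), and the power-of-two scaling is modeled with `math.ldexp` on floating
point values in place of a hardware shift. The names `rot`, `lms_step` and `shift` are hypothetical.

```python
import math


def rot(re, im, k):
    """Multiply (re, im) by j**k using swaps and negations only."""
    k %= 4
    if k == 0:
        return re, im
    if k == 1:
        return -im, re
    if k == 2:
        return -re, -im
    return im, -re


def lms_step(weights, symbol_indices, desired, shift):
    """One LMS update with step-size mu = 2**(-shift); no general multiplier is used.

    weights: list of (re, im) taps; symbol_indices: list of k with u = j**k;
    desired: (re, im) of d(i). Returns (new_weights, error).
    """
    # Error e(i) = d(i) - u_i w_i^T: selections followed by subtractions.
    e_re, e_im = desired
    for (w_re, w_im), k in zip(weights, symbol_indices):
        p_re, p_im = rot(w_re, w_im, k)
        e_re, e_im = e_re - p_re, e_im - p_im
    # mu * e(i): scaling by a power of two, modeled here with ldexp.
    g_re, g_im = math.ldexp(e_re, -shift), math.ldexp(e_im, -shift)
    # w_{i+1} = w_i + mu e(i) u_i^*; conj(j**k) = j**(-k) is again a selection.
    new_weights = []
    for (w_re, w_im), k in zip(weights, symbol_indices):
        d_re, d_im = rot(g_re, g_im, -k)
        new_weights.append((w_re + d_re, w_im + d_im))
    return new_weights, (e_re, e_im)
```

In this model the error evaluation and the coefficient update each cost two real add/sub operations
per tap, which is in line with the 4M total stated for QPSK.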

FIG. 15 is a table showing the complexity and minimum latency of the LMS algorithm, as
implemented using the techniques of the invention, for different modulation techniques. A QPSK
constellation requires 2M add/sub operations to evaluate the error equation and another 2M to update
the coefficients. For 16-QAM, the error computation requires 4M add/sub operations, the multiplication
$e(i) u_i^*$ requires 2M add/sub operations, and the coefficient update requires 2M add/sub operations,
i.e., 8M add/sub operations are required altogether. In summary, 4M, 8M and 12M add/sub
operations are required for QPSK, 16-QAM and 64-QAM constellations, respectively. Thus, each time
the constellation size is increased by a factor of four, another 4M operations are required.

As mentioned in Section 2.1 above, faster realizations are sometimes required. For the LMS
algorithm, the update cannot be performed until the error signal is available. Thus, a very rapid FIR
filter chain is important. This can be implemented using the hierarchical tree structure described in
Section 2.1. FIG. 16 shows an exemplary implementation of this type, comprising add/sub units 70-1
and 70-2, selectors 72-1 through 72-4, and an add/sub hierarchical tree structure 74. The
computation of the error signal in this case requires an additional subtraction: $e(i) = d(i) - u_i w_i^T$.
If the order of the filter is chosen to be $M = 2^{n} - 1$, $d(i)$ can be treated as one filter tap-weight element
and does not require additional delay.

As previously mentioned, depending on the sign of the last signal (in this case the error
signal), a final two's complement inverter may be required. This would cost an additional delay
since two's complement inverters cause carry-ripple effects. However, in the LMS algorithm, the
last inverter can be incorporated into the selection process of the coefficient update, i.e., if the
negative information is active, $-\mu e(i)$ is applied rather than $\mu e(i)$.

The modified selection is faster because it is realized by simple logical gates and does not
require an adder structure. For QPSK, one complete update cycle requires $\log_2 M \, T_A$ for the FIR
error part, possibly one $T_A$ for the multiplication of the error by the step-size $\mu$, and finally one add
operation for the updates of the coefficients. Thus, the minimum update time is given by
$(\log_2 M + 2) T_A$. If the step-size is a negative power of two, the multiplication by the error is a
simple scaling operation and can therefore be realized without any additional time delay. Other values
of the step-size can be approximated with a sum of two such values, i.e., $\mu = 2^{-n_1} \pm 2^{-n_2}$, so that one
add/sub operation is sufficient for the multiplication with the error. This operation gives a wide
range for possible step-size values. The possible additional operation is indicated with a dashed box
75 in FIG. 16.

As an example, assume that the LMS algorithm is applied to train a channel estimator of
order M = 15, and the technology used implements an add/sub operation in 1 ns and a multiplication
operation in 6 ns. Thus, the whole update for a BPSK/QPSK training sequence can be performed
in 6 ns with add/sub operations while it takes about 12 ns when using multipliers. The required chip
area and power consumption, on the other hand, might become ten times higher to implement the
multipliers. Real-time processing is thus possible for up to 1/(6 ns) ≈ 166 Msymbols-per-second in
this case.
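
The 6 ns figure is consistent with the minimum update time stated earlier if, as an assumption
made here for illustration, the desired signal d(i) is counted as a sixteenth input to the adder
tree (M + 1 = 16) and the adder delay is 1 ns:

```latex
(\log_2 16 + 2)\,T_A = (4 + 2) \times 1\ \text{ns} = 6\ \text{ns},
\qquad
\frac{1}{6\ \text{ns}} \approx 166.7\ \text{Msymbols/s}.
```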

2.3 ML Sequence Detection 

In a digital communication system that transmits information over a channel causing
Inter-Symbol-Interference (ISI), the optimum detector is a maximum-likelihood symbol sequence
detector (MLSD), as described in, e.g., J.G. Proakis, "Channel equalization," The Communications
Handbook, CRC Press, 1997, Chapter 26; and G.D. Forney, Jr., "Maximum-likelihood sequence
detection in the presence of intersymbol interference," IEEE Trans. on Information Theory, Vol.
IT-18, pp. 363-378, May 1972.

An efficient algorithm for implementing an MLSD is the Viterbi algorithm, which was 
originally devised for decoding convolutional codes. See the above-cited J.G. Proakis reference and 
G.D. Forney, Jr., "The Viterbi algorithm," IEEE Proceedings, Vol. 61, pp. 268-278, March 1973. 
In this case, the ISI channel is modeled as a Finite-State Machine (FSM), called a trellis, with 
$P^{M-1}$ states. Here P is the information symbol alphabet size and M is the number of 
complex-valued channel FIR filter coefficients, $w_i$. In the trellis, there are P transitions 
diverging from each state, corresponding to the P different values of the information symbol, u(k). 
The values associated with the transitions between the states are u(k), i.e., the possible received 
values, given the estimated channel coefficients $w_i$. 

Assuming that the received sequence is r(k), the estimated channel coefficients are $w_i$, and 
the input information symbols are u(k), the Viterbi algorithm finds the most-likely transmitted 
symbol u(k) by recursively finding the path in the trellis that is closest in Euclidean distance to 
the received noisy sequence r(k). That is, it implements the ML detector criterion by recursively 
minimizing, with respect to u(k), the Euclidean distance between r(k) and the channel output 
predicted from the coefficients $w_i$ and the candidate symbols (Equation 16). 

At each recursion step, the Viterbi algorithm searches over the $P^M$ possible transitions. If the 
channel model coefficient changes from one recursion to the other, for every one of the M estimated 
channel coefficients $w_i$, P multiplications with the P symbols of the symbol alphabet are 
required. Thus, a total of PM = O(PM) complex multiplications are required to compute the various 
terms initially. In order to compute the cost metrics of the $P^M$ transitions between the states 
in the FSM, another $(M - 1) \times P^M = O(MP^M)$ complex additions are required. If this is done 
using a conventional method, a complex multiplication is performed with four to eight add/sub 
operations depending on the symbols. The value (3 + 7j), for example, requires one addition for the 
multiplication by three (3 = 1 + 2) and one subtraction for seven (7 = 8 - 1). This needs to be 
performed on the real as well as on the imaginary part of the coefficient, resulting in 
O(PM) = 4PM. Finally, the real and imaginary parts need to be added to r(k), which requires another 
2 add operations. For all M coefficients and $P^M$ state transitions, there are thus a total of 
$O(MP^M) = 2MP^M$ add operations. 
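The conventional shift-and-add multiplication by the symbol (3 + 7j) can be sketched as follows. This is an illustrative sketch for integer fixed-point coefficients; the function names are hypothetical. Four add/sub operations form the scaled real and imaginary parts, and two more combine them.

```python
def times3(x):
    """3x = x + 2x: one addition (the doubling is a left shift)."""
    return x + (x << 1)


def times7(x):
    """7x = 8x - x: one subtraction (the factor 8 is a left shift)."""
    return (x << 3) - x


def mul_by_3_plus_7j(a, b):
    """(a + jb)(3 + 7j) = (3a - 7b) + j(7a + 3b), using add/sub and shifts only."""
    re = times3(a) - times7(b)     # 2 add/subs for scaling, 1 to combine
    im = times7(a) + times3(b)     # 2 add/subs for scaling, 1 to combine
    return re, im


print(mul_by_3_plus_7j(5, -2))
```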

Using the techniques of the invention as described in Section 1.1 above, complexity can be 
reduced considerably. For QPSK, multiplication of each coefficient with u(k) becomes a selection 
process, and O(PM) = 0. Only the computation of the transitions remains, with $O(MP^M) = 2MP^M$ add 
operations to compute $\tilde{r}(k) = r(k) - \sum_{l} w_l\, u(k-l)$ in Equation 16. For 16-QAM, the 
first step is a selection process, the next is two add/subs to compute each multiplication per 
symbol, thus O(PM) = 2PM = 32M. For 64-QAM, in each subset of four symbols 10 add/subs are required 
for computing the multiplication, thus O(PM) = 160M. The application of this technique has the 
additional advantage that it can readily be pipelined without adding too much chip area for the 
additional registers. 
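The QPSK selection process can be illustrated with a short sketch. It assumes that the rotated QPSK constellation consists of the four points +1, +j, -1 and -j (a 45° rotation of the conventional points, with unit scaling); the symbol labeling 0 to 3 and the function name are hypothetical.

```python
def qpsk_select(a, b, sym):
    """Return (re, im) of (a + jb) * s for s in {0: +1, 1: +j, 2: -1, 3: -j}.

    The product is obtained by swapping and negating the inputs,
    so no multiplication and no addition is performed.
    """
    if sym == 0:
        return (a, b)      # times +1
    if sym == 1:
        return (-b, a)     # times +j
    if sym == 2:
        return (-a, -b)    # times -1
    if sym == 3:
        return (b, -a)     # times -j
    raise ValueError("sym must be 0, 1, 2 or 3")


print(qpsk_select(3, -5, 1))
```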

At the expense of increasing pipelining complexity, the minimal operations described in 
Section 1.2 can reduce the computation complexity even further. For example, for 16-QAM 
modulation, O(PM) = HM and for 64-QAM, O(PM) = 68M. 
Initializing the trellis at the beginning of an equalization process requires computing all 
possible multiplications with all the elements in the symbol alphabet once. This assumes that the 
channel remains constant over a frame of data. If the channel is rapidly changing from symbol to 
symbol, this initialization has to be performed at every recursion to update the transition values 
of the trellis. The obtained values can be stored in a look-up table for computing the Euclidean 
norm in the Viterbi algorithm. FIG. 17 shows a table comparing the complexity for initialization 
(O(PM)) of the Section 1.1 and Section 1.2 implementations with a conventional implementation, for 
different types of modulation. 
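The use of a look-up table inside the Viterbi recursion can be sketched as follows. This is a simplified illustration rather than the implementation of the invention. It assumes a channel that is constant over the frame, a squared Euclidean branch metric, and a known preamble of M - 1 copies of the first alphabet symbol preceding the data. All function and variable names are hypothetical.

```python
def build_lut(w, alphabet):
    """lut[i][p] = w[i] * alphabet[p], computed once per frame (the O(PM) step)."""
    return [[wi * s for s in alphabet] for wi in w]


def viterbi_mlsd(r, w, alphabet):
    """Full-search MLSD over an ISI channel with taps w (w[0] weights the newest symbol)."""
    M, P = len(w), len(alphabet)
    lut = build_lut(w, alphabet)
    start = (0,) * (M - 1)             # assumed preamble: alphabet[0] repeated
    cost = {start: 0.0}                # best accumulated metric per state
    paths = {start: []}                # survivor symbol indices per state
    for rk in r:
        new_cost, new_paths = {}, {}
        for state, c in cost.items():
            # ISI of the M-1 past symbols; only table look-ups and additions
            isi = sum(lut[i + 1][p] for i, p in enumerate(state))
            for p in range(P):
                branch = abs(rk - lut[0][p] - isi) ** 2
                nxt = ((p,) + state)[: M - 1]
                total = c + branch
                if nxt not in new_cost or total < new_cost[nxt]:
                    new_cost[nxt] = total
                    new_paths[nxt] = paths[state] + [p]
        cost, paths = new_cost, new_paths
    best = min(cost, key=cost.get)
    return [alphabet[p] for p in paths[best]]


# Noise-free check with an assumed 3-tap channel and rotated QPSK symbols
w = [1 + 0.5j, 0.3 - 0.2j, 0.1j]
alphabet = [1, 1j, -1, -1j]
tx = [2, 1, 3, 0, 0, 2, 1]
past, r = [0, 0], []
for p in tx:
    r.append(w[0] * alphabet[p] + w[1] * alphabet[past[0]] + w[2] * alphabet[past[1]])
    past = [p, past[0]]
print(viterbi_mlsd(r, w, alphabet) == [alphabet[p] for p in tx])
```

If the channel changed from symbol to symbol, `build_lut` would have to be called inside the loop, which is the per-recursion initialization cost discussed above.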

In order to obtain the complete complexity of the Viterbi algorithm, $O(MP^M)$ needs to be 
added if a full search through the trellis is applied. This part can easily exceed the initial 
complexity of O(PM). Reduced complexity techniques may be applied, such as reduced-state sequence 
estimation techniques that limit the search through the trellis, as described in, e.g., M. Eyuboglu 
and S. Qureshi, "Reduced-state sequence estimation for coded modulation on intersymbol interference 
channels," IEEE Journal on Selected Areas in Communications, Vol. 7, pp. 989-995, August 1989. 
In this case, the initial complexity can become very significant. Using a tree structure in 
accordance with the invention for implementing the add operations can further reduce the complexity 
as described in Section 1.1 above. 
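The relative size of the two complexity terms can be illustrated numerically. The sketch below uses the Section 1.1 initialization counts quoted above (0, 32M and 160M add/subs) and reads the full-search term as 2M times P to the power M, which is an assumption about the garbled exponent. The dictionary and function names are hypothetical.

```python
# Add/sub operations per channel tap for initialization, Section 1.1 figures
INIT_ADDS_PER_TAP = {"QPSK": 0, "16-QAM": 32, "64-QAM": 160}
ALPHABET_SIZE = {"QPSK": 4, "16-QAM": 16, "64-QAM": 64}


def init_adds(modulation, M):
    """Initialization cost O(PM) in add/sub operations."""
    return INIT_ADDS_PER_TAP[modulation] * M


def full_search_adds(modulation, M):
    """Full trellis search cost, read here as 2 * M * P**M add operations."""
    P = ALPHABET_SIZE[modulation]
    return 2 * M * P ** M


for mod in ("QPSK", "16-QAM", "64-QAM"):
    print(mod, init_adds(mod, 3), full_search_adds(mod, 3))
```

For M = 3 the full search dominates in every case, which is why the initialization term only becomes significant once a reduced-state technique limits the search.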

3. Receiver Examples 

FIG. 18 shows an illustrative embodiment of a receiver 100 in which the above-described 
signal processing operations may be implemented. The receiver 100 receives a sequence of symbols 
transmitted by a transmitter 102. The symbols are generated by transmitter 102 in accordance with 
a rotated constellation as previously described. The receiver 100 includes an ML sequence detector 
(MLSD) 104 which implements the Viterbi algorithm, and is connected in parallel with an LMS 
estimator 106. One or both of the MLSD 104 and LMS estimator 106 are implemented using the 
computational techniques described above. The channel estimates from the LMS estimator 106 are 
fed to the MLSD as shown. An optional channel decoder 108 is also included in the receiver 100. 

FIG. 19 shows another example receiver 120 in which the invention may be implemented. 
The receiver 120, which implements a decision feedback equalization (DFE) technique, includes 
the MLSD 104, LMS estimator 106 and optional channel decoder 108, and also includes 
a set of feed-forward filters 122 and a set of feedback filters 124. The output of the LMS estimator 
in this case is used to determine adaptive filter taps for the set of feedback filters 124. The received 
symbols are processed through feed-forward filters 122, and then applied to the MLSD 104. The 
output of the MLSD is fed back through the feedback filters 124. 

It should be noted that the arrangements of FIGS. 18 and 19 are examples only, and other 
embodiments may include different arrangements of elements and/or additional elements not 
explicitly shown. 

4. Conclusion 

The invention provides efficient computational methods and apparatus that allow large 
reductions in complexity for implementations of communication algorithms that require multiplying 
a coefficient with a constellation point in a PSK or QAM constellation. Chip area as well as latency 
can be reduced due to the substitution of multiplications by simpler functions. Note that applying 
this technique does not cause any approximation or reduction in precision, but simply removes 
area-intensive and time-intensive operations. 

It should be noted that the above-described illustrative embodiments may be implemented 
in hardware, software or combinations of hardware and software. For example, the computational 
structures illustrated in FIGS. 3-6, 8, 9, 11, 13, 14 and 16 may be implemented as elements of an 
application-specific integrated circuit (ASIC) or other digital data processing device for use in a 
channel estimator, channel equalizer, demodulator, decoder or other element of a digital 
communication system receiver. 

Although the illustrative embodiments of the present invention have been described herein 
with reference to the accompanying drawings, it is to be understood that the invention is not limited 
to those precise embodiments, and that various other changes and modifications may be effected 
therein by one skilled in the art without departing from the scope of the invention as set forth in the 
appended claims. 

Claims 

What is claimed is: 

1. A method of processing information in a receiver of a digital communication system, the 
method comprising the step of: 
applying a signal processing operation to a sequence of transmitted symbols, wherein 
the transmitted symbols correspond to points in a first modulation constellation corresponding to a 
rotated version of a second modulation constellation, and each of the transmitted symbols represents 
a particular number of information bits, and further wherein use of the first modulation constellation 
allows the signal processing operation to be performed using a reduced number of operations relative 
to the number of operations required in conjunction with the second modulation constellation. 

2. The method of claim 1 wherein use of the first modulation constellation allows the signal 
processing operation to be performed without multiplication operations. 

3. The method of claim 1 wherein the first modulation constellation is generated by applying 
a 45° rotation to the second modulation constellation. 

4. The method of claim 1 wherein the second modulation constellation comprises one of a 
PSK constellation and a QAM constellation. 

5. The method of claim 1 wherein the signal processing operation comprises at least one of 
a finite impulse response (FIR) filtering operation, a Least-Mean-Squares (LMS) estimation 
operation, and a Maximum-Likelihood (ML) sequence detection operation using a Viterbi algorithm. 

6. The method of claim 1 wherein the signal processing operation utilizes a selector to 
implement a complex multiplication of a channel estimate coefficient with a symbol from the first 
modulation constellation. 

7. The method of claim 6 wherein the selector receives as inputs real and imaginary parts 
of an element of the channel estimate coefficient, and generates as outputs real and imaginary parts 
of a product of the element of the channel estimate coefficient with a corresponding element of a 
given one of the symbols, without utilizing a multiplication operation. 

8. The method of claim 7 wherein the selector comprises first and second switches and first 
and second add/subtract units, the first and second switches each selecting one of the real or the 
imaginary part of the element of the channel estimate coefficient for application to a corresponding 
one of the add/subtract units, such that the add/subtract units compute elements of real and imaginary 
parts of an inner vector product. 

9. The method of claim 8 wherein an FIR filter operation is implemented using the selector 
by including feedback from outputs of the add/subtract units to corresponding inputs of the 
add/subtract units. 

10. The method of claim 1 wherein the signal processing operation comprises a multi-stage 
multiplication operation implemented without multiplication operations, wherein each stage of the 
multi-stage operation corresponds to a selector, and a left shift element is arranged between an 
output of a given one of the stages and a corresponding input of a subsequent stage. 

11. The method of claim 1 wherein the signal processing operation is implemented utilizing 
a multi-stage hierarchical adder tree without multiplication operations. 

12. An apparatus for use in processing information in a receiver of a digital communication 
system, the apparatus comprising: 
a signal processing circuit for processing a sequence of transmitted symbols, wherein 
the transmitted symbols correspond to points in a first modulation constellation corresponding to a 
rotated version of a second modulation constellation, and each of the transmitted symbols represents 
a particular number of information bits, and further wherein use of the first modulation constellation 
allows the signal processing operation to be performed using a reduced number of operations relative 
to the number of operations required in conjunction with the second modulation constellation. 

13. The apparatus of claim 12 wherein use of the first modulation constellation allows the 
signal processing operation to be performed without multiplication operations. 

14. The apparatus of claim 12 wherein the first modulation constellation is generated by 
applying a 45° rotation to the second modulation constellation. 

15. The apparatus of claim 12 wherein the other modulation constellation comprises one of 
a PSK constellation and a QAM constellation. 

16. The apparatus of claim 12 wherein the signal processing circuit comprises at least one 
of a finite impulse response (FIR) filter, a Least-Mean-Squares (LMS) estimator, and a 
Maximum-Likelihood (ML) sequence detector implemented using a Viterbi algorithm. 

17. The apparatus of claim 12 wherein the signal processing circuit comprises at least one 
selector operative to implement a complex multiplication of a channel estimate coefficient with a 
symbol from the first modulation constellation. 

18. The apparatus of claim 17 wherein the selector receives as inputs real and imaginary 
parts of an element of the channel estimate coefficient, and generates as outputs real and imaginary 
parts of a product of the element of the channel estimate coefficient with a corresponding element 
of a given one of the symbols, without utilizing a multiplication operation. 

19. The apparatus of claim 18 wherein the selector comprises first and second switches and 
first and second add/subtract units, the first and second switches each selecting one of the real or the 
imaginary part of the element of the channel estimate coefficient for application to a corresponding 
one of the add/subtract units, such that the add/subtract units compute elements of real and imaginary 
parts of an inner vector product. 

20. The apparatus of claim 19 wherein the signal processing circuit comprises an FIR filter 
implemented using the selector configured with feedback from outputs of the add/subtract units to 
corresponding inputs of the add/subtract units. 

21. The apparatus of claim 12 wherein the signal processing circuit comprises a multi-stage 
circuit implemented without multiplication operations, wherein each stage of the multi-stage circuit 
corresponds to a selector, and a left shift element is arranged between an output of a given one of 
the stages and a corresponding input of a subsequent stage. 

22. The apparatus of claim 12 wherein the signal processing circuit is implemented utilizing 
a multi-stage hierarchical adder tree without multiplication operations. 

23. An apparatus for use in processing information in a receiver of a digital communication 
system, the apparatus comprising: 
signal processing means for applying a signal processing operation to a sequence of 
transmitted symbols, wherein the transmitted symbols correspond to points in a first modulation 
constellation corresponding to a rotated version of a second modulation constellation, and each of 
the transmitted symbols represents a particular number of information bits, and further wherein use 
of the first modulation constellation allows the signal processing operation to be performed using 
a reduced number of operations relative to the number of operations required in conjunction with 
the second modulation constellation. 

24. A method of processing information in a transmitter of a digital communication system, 
the method comprising the step of: 
generating a sequence of transmitted symbols, wherein the transmitted symbols 
correspond to points in a first modulation constellation generated by applying a predetermined 
rotation to a second modulation constellation, and each of the transmitted symbols represents a 
particular number of information bits, and further wherein use of the first modulation constellation 
allows a signal processing operation in a corresponding receiver of the system to be performed using 
a reduced number of operations relative to the number of operations required in conjunction with 
the second modulation constellation. 

25. An apparatus for use in processing information in a transmitter of a digital 
communication system, the apparatus comprising: 
means for generating a sequence of transmitted symbols, wherein the transmitted 
symbols correspond to points in a first modulation constellation representative of a rotated version 
of a second modulation constellation, and each of the transmitted symbols represents a particular 
number of information bits, and further wherein use of the first modulation constellation allows a 
signal processing operation in a corresponding receiver of the system to be performed using a 
reduced number of operations relative to the number of operations required in conjunction with the 
second modulation constellation. 






Lou 13-13 

Abstract 

Signal processing operations are performed in a digital communication system receiver on 
a sequence of received symbols, each representing a number of information bits. The symbols 
correspond to points in a given modulation constellation generated by applying a predetermined 
rotation, e.g., a 45° rotation, to an otherwise conventional modulation constellation, e.g., a QPSK 
constellation, a 16-QAM constellation, etc. The use of the rotated constellation allows certain signal 
processing operations, such as filtering, Least-Mean-Squares (LMS) estimation, and 
Maximum-Likelihood (ML) sequence detection via the Viterbi algorithm, to be performed without the 
need for multipliers. By eliminating or substantially reducing the number of required multiplication 
operations, the invention significantly reduces the complexity and delay associated with the 
corresponding signal processing circuitry. Advantageously, this reduction in complexity and delay 
is accomplished without the use of any approximation or other reduction in precision. 